ABSTRACT

The history of the Educational Testing Service (ETS) 
Factor Kits is summarized. The original ETS Factor Kit was developed 
in 1954 and contained 51 tests, three for each of 15 factors and
six for a 16th factor. The next edition was developed in 1963 and 
included adaptations (clones) of the defining tests instead of the 
exact copies. These tests marked 24 factors. The current edition of 
the ETS Factor Kit was developed in 1976 and consists of 72 tests 
marking 23 cognitive factors. Some limitations of paper-and-pencil 
versions of the kits are identified, and computer-administered 
versions being developed are described. Information is given about a 
study comparing computer and paper-and-pencil tests. The Factor Kit 
tests were intended to be used as markers in factor-analytic studies 
of cognition and have been widely used in psychological research. 
Tests that could be used to determine a number of major factors were 
assembled in "kits" for factorial research. Limitations of the format 
restricted the kinds of cognitive processes that could be assessed 
and the ways in which tests could be scored. Questions of test misuse 
arose. Creating computer-administered versions posed a number of 
problems in the areas of timing, confirmation and correction of 
responses, and pacing. System features of the computer-administered
versions are described. A small pilot study compared the two formats
using data for 30 secondary school students aged 13 to 19 years, who
took one part of each of 10 Kit tests in each format. Results suggest
that the factors measured by these 10 tests were not affected by the 
use of the computer version. Versions of the computer-administered 
kit for field testing are anticipated in 1992. Three tables provide 
details about the 1954, 1963, and 1976 editions of the Factor Kits. A 
17-item list of references is included. (SLD) 

COMPUTER-BASED ASSESSMENT OF COGNITION: THE ETS FACTOR KIT

Ruth B. Ekstrom and Isaac I. Bejar 
Educational Testing Service 

For more than 35 years the ETS Factor Kits (French, 1954; 
French, Ekstrom & Price, 1963; Ekstrom, French & Harman, 1976) 
have provided researchers with tests of cognitive processes such 
as reasoning, memory, verbal ability, and spatial ability. The 
Kit tests were intended to be used as markers in factor-analytic 
studies of cognition. 

The Kit tests are widely used in psychological research. 
According to the Social Sciences Citation Index, there were
over 400 published studies citing the Kits from 1972 through
1988. Recent studies have examined the factor structure of the ASVAB
battery (Augustin, Gillet, Guerrero & Curran, 1989); verbal and
visual learning styles (Kirby, Moore & Schofield, 1988);
hemispheric differences in components of mental rotation (Fischer
& Pellegrino, 1988); performance on competing tasks (Fogarty &
Stankov, 1988); familial resemblances in cognitive abilities
(Abdelrahim, Nagoshi, Johnson & Vandenberg, 1988); reasoning and
language proficiency (Boyle, 1987); the effects of cognitive
training on mental-ability structure (Schaie, Willis, Hertzog &
Schulenberg, 1987); and video-game performance (Jones, Dunlap &
Bilodeau, 1986).

This paper summarizes the history of the Kits; identifies 
some limitations of the paper-and-pencil editions; describes the 
computer-administered version of the Kit, now being developed at 
ETS; and presents information about a small study comparing 
computer-administered and paper-and-pencil tests. 

History of the Kits 

At the beginning of the 1950s, factor analysis was seen as 
an emerging technology with the potential to achieve order out of 
the hodgepodge of aptitude and achievement tests then available. 
To that end John W. French produced a monograph, The Description
of Aptitude and Achievement Tests in Terms of Rotated Factors,
"devoted to the progress of test development toward the situation
where the test constructor has a file of tests to measure each
factor of the mind" (French, 1951, page v). This monograph
featured re-interpretation of factors and their identification 
across studies. Its wide acceptance led French to the idea of 
putting together, in a "Kit", several tests that could be 
expected to determine a number of major factors. 

The Original Kit. The first Kit, entitled the Kit of Selected
Tests for Reference Aptitude and Achievement Factors (French,
1954), was published in 1954. It consisted of 51 tests, three for
each of 15 factors and six for a sixteenth factor. (A list of the
factors and tests in this Kit appears in Table 1.) For each test,
a name and a symbol linking the test to a factor were provided. A
manual provided a description of each test, a key, information
about time limits and appropriate grade levels, and information
about how to obtain copies of the test or the requirements for
reproducing it. (The authors of all of these tests had agreed
they could be reproduced for research purposes.) The manual did
not provide reliability, validity or norming information, stating
that this was not appropriate since the tests were "suggested for
the single purpose of factorial research." Kit users were asked
to provide French with information from research studies so it
could be shared.

A Kit with "Clones". By 1958 it became apparent that a
revised Kit was needed. New research, especially the work of
Guilford in the area of divergent production, had identified 
additional factors. Other factors needed re-conceptualization or 
different marker tests. It also had become apparent that some 
abuses of the Kit were occurring, usually involving unauthorized 
reproduction of the tests without the copyright owner's 
permission. It was decided that the new Kit would use 
adaptations or "clones" of the defining tests, whenever the 
copyright owners agreed, instead of exact copies. The 
adaptations made it possible to give the new Kit tests a 
relatively uniform format and directions that were as parallel as 
possible. The adaptation also involved producing two separately 
timed parallel parts, both for administrative convenience and to 
facilitate the estimation of test reliability. Blanket
permission for reproduction was given for the adapted tests
created at ETS and for tests copyrighted by J. P. Guilford; tests
copyrighted by Sheridan Supply Company had to be purchased from
that source. Small-scale studies were done to obtain correlation
matrices to see if the new adapted tests for a factor held
together but, because of financial limitations, no factor
analysis of the entire set of tests was undertaken. A list of the
factors and tests in the 1963 Kit of Reference Tests for
Cognitive Factors (French, Ekstrom & Price, 1963) appears in Table 2.

The Current Kit. By 1971 it became apparent, once again,
that it was time to revise the Kit. A review of the literature
suggested that at least six additional factors were sufficiently 
well-established to warrant inclusion in a new Kit (Ekstrom, 
1973). "Established" was defined as a factor having appeared in 
at least three different studies done by at least two different 
researchers or research laboratories. Carroll's "Psychometric
tests as cognitive tasks: A new 'Structure of Intellect'" also
informed the work; it was published as one of the technical
reports from the revision project (Carroll, 1974). This revision
involved more experimental work and field tryouts of the tests
than had been done earlier (Ekstrom, French & Harman, 1979).
Because of persistent problems with the unauthorized reproduction
of the Kit tests, a process of licensing test use was instituted.
The 1976 edition of the Kit (Ekstrom, French & Harman, 1976)
consisted of 72 tests marking 23 cognitive factors (see Table 3).

Problems and Limitations 

The paper-and-pencil format of the earlier editions of the 
ETS Factor Kit restricted the kinds of cognitive processes that
could be assessed. It also made it impossible to separate speed
and level-of-accuracy (Carroll, 1988). For the researcher, the
paper-and-pencil format meant that hand scoring of responses was 
necessary and that the results then had to be entered into a data 
base for analysis. 

A second limitation is that no factor analysis has been
conducted using the entire set of tests in any Kit.
Consequently, the relationships among the factors have been
inferred from limited data. A recent study (Wothke et al., 1990)
included all factors in the 1976 Kit but the design, using only
two tests for each factor, led to an underestimate of the number
of factors. (Defining a factor by two nearly identical tests
will typically yield approximately one-fourth to one-third
too few factors with roots greater than one.)

An on-going concern has been the extent to which the adapted 
tests are adequate stand-ins for the original research 
instruments. The development of the "clone" tests relied on the 
construct validity of the factors and on the editors' knowledge 
of cognition. One study of the tests for five spatial factors 
(Ekstrom, 1967) concluded that most of the tests in the 1963 Kit 
were similar enough to the originals in the 1954 Kit to load on 
the same factor. An exception was the Hidden Figures Test, 
created by Witkin and his colleagues to measure field dependence- 
independence and included in the 1963 Kit as a marker for 
flexibility of closure; this test appears to be primarily a 
measure of visualization, although it does have some variance on 
flexibility of closure. A recent re-analysis of this study by 
Carroll (personal communication) shows similar results. Other re- 
analyses by Carroll, using a hierarchical methodology, show that 
some of the tests in the 1954 Kit are more factorially complex 
than was originally thought. 

Finally, there has been concern over the use of the Kit 
tests in ways never intended by the authors. The Kits were 
created to facilitate research in cognition by factor-analytic 
methods. The Kit tests were selected because they had been used 
in previous factor-analytic studies, were short and easy to 
administer, and the authors were willing to have them reproduced 
for research use or adapted. These tests were never considered by 
the Kit editors to be the best or defining measures of these 
aspects of cognition; they are merely the ones that met the Kit 
requirements. As was pointed out in the manual for the 1976 Kit, 
"There are probably no such things as truly 'pure' factors"
(Ekstrom, French & Harman, 1976, p. 4). Despite this and other
caveats, there has been an increased use of Kit tests in a 
variety of studies including, in addition to psychological 
research, neurological, physiological, and genetic research. It 
may be appropriate to use Kit tests to identify the kinds of 
cognitive processes affected by head injuries or exposure to 
toxics. It is less clear that the Kit tests should be used to
"prove" the relationship between hormonal levels and certain
abilities or to demonstrate that certain components of cognition
have a hereditary component. In addition, there has been concern
that the Kit has provided a consensus about abilities significant
for research (Cronbach, 1984), thus tending to limit rather than
stimulate factor-analytic studies of cognition. 

At the 1952 conference, which led to the creation of the
first Kit, a number of points were made that are important to
remember today. Dorothy Adkins voiced concern that continued use
of the same tests to define a factor might lead to the
perpetuation of mistakes. Harold Bechtoldt commented that "'best
test' is a poor concept; a test merely measures." French pointed
out that the tests being considered for the Kit were not ideal 
and that, because their selection was heavily influenced by 
considerations about brevity and availability to ETS, their use 
should be limited to research (French, 1952). 

All of this points out the need for more research with the 
Kit tests, both to understand more about the constructs called 
"cognitive factors" and to further our understanding of cognitive 
processes. To this end, ETS is developing a new computer- 
administered version of the Kit. 

Creating a Computer-Administered Kit 

There were three basic problems involved in creating 
computer-administered versions of the Kit tests: 1) How to change 
test format without changing the required cognitive processes; 2) 
How to keep the tests as similar as possible to the paper-and- 
pencil versions while, at the same time, making use of the 
advantages of computer administration; and 3) How to design the 
new Kit to facilitate research that will add to our understanding 
of cognitive processes, especially the relative contributions of 
speed and power. 

Among the issues that have been considered are those
involving timing, confirmation and correction of responses, and
pacing.

Timing Issues. Although most computer-administered tests
do not limit testing or response time, we decided to
do so. We reasoned that, since there are time limits on paper-
and-pencil tests, removing time restrictions entirely might
change subjects' response strategies and alter the meaning of the
results. We have not established a time limit for an entire test
but we have limited the time for each item. The default response
time has been set high enough, however, to allow subjects to
ponder some of the more difficult items. Researchers will
have, for each item, a record of response latency as well as
response correctness. The preliminary computer-administered
version allows researchers to determine time for the initial
response and, in addition, time for changing and/or confirming
responses. We have decided to include timing switches that
can be set by the researcher to increase data collection
flexibility and to allow a variety of approaches to the analysis
of response time.

Confirmation and Response Correction Issues. Although the
norm in cognitive experiments on computers seems to be to require 
no confirmation of responses, computerized psychometric 
instruments typically require such confirmation to allow subjects 
to change their answers. Since one goal was to make our 
computer-administered tests as much like the paper-and-pencil 
originals as possible, we decided to require confirmation of 
responses and to permit changing answers on all but the most 
highly speeded tests. 

Pacing Issues. Another question was whether to allow
subjects to pace themselves and regulate the speed at which new
items appear or to have the pace of administration controlled by
the computer. Again, with the goal of keeping the computer-
administered tests as much like paper-and-pencil tests as
possible, we decided to have the items self-paced on all but the
most highly speeded tests. On the speeded tests, subjects will
be alerted before item presentation by a "beep" and, as indicated
above, no confirmation or changing of responses will be involved.

With these issues in mind, we moved to the design of the 
system. 

System Features. The minimum system configuration for the
new computer-administered Kit is an IBM-compatible computer with 
256k of memory, a graphics adaptor, and two floppy disks. Thus a 
relatively inexpensive computer can be used. In a networking 
environment, several students can be tested simultaneously. 

The computer not only records subjects' responses for each 
item but also provides scoring for most tests, thus doing away 
with the antiquated hand scoring process. In addition, a data 
base will be created enabling researchers to go directly from 
data collection to data analysis without the necessity of 
manually entering the responses into a computer. 

The program also includes features that will allow the 
researcher to assign different tests to different subjects and to 
vary the order of testing from subject to subject. With some
additional programming by the researcher, it will be possible to
decide, on the basis of previous tests, which test should be
administered next. The feature of having two parallel and 
separately timed parts for each test facilitates pre-post 
studies, such as changes in cognitive processes as a result of an 
intervening experience. 

System Components. The major components of the system are
the Test Delivery System, the Kit Tests, the Test Administration 
System, and the Permanent Database Facility. The relationship 
among these is shown in Figure 1. 

The Test Administration System (TAS) is the program through 
which the researcher designs and monitors the testing. This 
system prepares the files needed by the Test Delivery System. 
These files are: 1) ID.CRD, the list of subjects and the test
administration design to which each has been assigned; 2)
DESIGNxx.CRD, one or more files containing data collection
designs (the last two characters of the file name uniquely
identify the design); 3) PATHS.CRD, containing the location of
certain files; and 4) LICENSE.CRD, which will count the number
of tests processed by the system. The researcher can query the
Test Administration System to determine the status of subjects
and the remaining number of licensed copies of each test. Once
the tests have been administered through the Test Delivery
System, the Test Administration System collects data from the
response files and converts them to an ASCII file in preparation
for transferring the data to the Permanent Database Facility.

The Test Administration System contains a separate program
for each test. This was done because, in the Factor Kit, test 
stimuli, responses and scoring differ so much from one test to 
another that it would have been difficult to devise a 
sufficiently general program and item bank to handle all of the 
tests. Despite the fact that there is a separate program for 
each test, the flow of information is essentially identical. 
However, the item banks differ across tests as do the response 
and scoring modules. While there is a separate executable
program for each test, a single file (RESP.CRD) is used to hold
the responses from all of the tests.

The Test Delivery System (TDS) interacts with the subject. 
Once the researcher has designed the data collection, a proctor 
can test a subject simply by running the batch file TDSBAT.BAT. 
This batch file, in turn, runs three programs: 1) VIDEO, which
loads the appropriate graphic drivers for the computer; 2) TDS
itself, which "plays" the lines in the DESIGNxx.CRD file; and
3) VIDEOFF, which unloads the graphic drivers. TDS asks each
subject for an identification number and proceeds if this is
correct. TDS does not administer the tests but, rather, runs the
programs called for by DESIGNxx.CRD, which do the actual
administration. Thus, DESIGNxx.CRD is a script of programs that
are to be "played". These programs need not all be Kit tests.
Other tests can be included or training programs can be presented
between pre- and post-tests. As a rule, the script implicit in
DESIGNxx.CRD is static; that is, the tests and the order in which
they will be given are fixed. However, dynamic data collection
systems are possible by having an external test or program modify
the DESIGNxx.CRD file.
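
The idea of a design file as a script of programs to be "played" can be sketched in a few lines of Python. This is an illustration only: the original system was a set of DOS programs, and the one-program-per-line layout assumed here for a design file, the function names parse_design and play_design, and the program names in the example are all hypothetical rather than taken from the Kit software.

```python
from typing import Callable, Iterable, List


def parse_design(lines: Iterable[str]) -> List[str]:
    """Return the program names in a design script, in order.

    Assumes (hypothetically) one program name per line; blank lines
    are skipped.
    """
    return [line.strip() for line in lines if line.strip()]


def play_design(lines: Iterable[str], run: Callable[[str], None]) -> List[str]:
    """Run each listed program in turn and return the order played.

    `run` stands in for whatever launches a test program; the player
    itself administers nothing.
    """
    played = []
    for program in parse_design(lines):
        run(program)
        played.append(program)
    return played


# Example: two tests with a training program between them.
# The program names are made up for illustration.
design = ["TEST_A1", "", "TRAINING", "TEST_A2"]
log: List[str] = []
order = play_design(design, log.append)
print(order)
```

Because the player only reads a list of names, a "dynamic" design amounts to another program rewriting that list before the player reaches the next line.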

The Permanent Database Facility will manage the storage of 
data and create data matrices for analysis. 

Flexible Timing, Confirmation and Pacing Models. Computer
administration provides not only the possibility of recording 
response time but, also, the advantage of flexibility in 
administration design. As indicated earlier, we used this 
advantage to solve the question of which of several timing, 
confirmation and pacing models to choose. A researcher can 
choose one of four models: confirmation with or without pacing
and no confirmation with or without pacing. In regard to timing, 
we decided to record all keystrokes, from initial to final, and 
to make it possible for researchers who are interested in only a 
sub-set of these response elements to select those of concern. 
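
A minimal sketch of how the four models and a full keystroke record might be represented follows. The class name, the field names, and the (time, key) log layout are assumptions made for illustration; the paper does not describe the actual data structures, and "pacing" is read here as computer-controlled item onset.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AdministrationModel:
    confirmation: bool    # must the subject confirm each response?
    computer_paced: bool  # does the computer control item onset?


# The four models: confirmation (yes/no) crossed with pacing (yes/no).
MODELS = [AdministrationModel(c, p) for c, p in product([True, False], repeat=2)]

# One keystroke log entry: (seconds since item onset, key pressed).
Keystroke = Tuple[float, str]


def initial_latency(log: List[Keystroke]) -> Optional[float]:
    """Time of the first keystroke, or None if there was no response."""
    return log[0][0] if log else None


def final_latency(log: List[Keystroke]) -> Optional[float]:
    """Time of the last keystroke (for example, the confirmation)."""
    return log[-1][0] if log else None


# Hypothetical item: the subject answers "B", changes to "C", then confirms.
item_log = [(2.4, "B"), (5.1, "C"), (5.9, "ENTER")]
change_time = final_latency(item_log) - initial_latency(item_log)
```

Keeping every keystroke means a researcher interested only in the initial response, or only in the time spent changing and confirming, can derive either measure from the same record.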

Comparability 

Our work with the Kit provides an opportunity to study how 
format change affects a very diverse group of tests. In the 
summer of 1989 Scott Hershberger, a pre-doctoral summer fellow at 
ETS, conducted a small pilot study of ten Kit tests in both 
paper-and-pencil and computer-administered format. 

The tests were measures of the induction, general reasoning, 
and verbal comprehension factors. All of these tests are in 
multiple-choice format. Time limits applied to the paper-and-
pencil mode but the computer administration was untimed. The
subjects were 30 secondary school students, ages 13 to 19. 
Testing order was counter-balanced by format but not by factor. 
The tests were always administered in the same order. Group 1 
took Part 1 of each test by computer, directly followed by Part 2 
in paper-and-pencil format; Group 2 took Part 1 in the paper-and- 
pencil format and Part 2 by computer. 
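
The assignment of parts to formats can be written out as a short Python sketch. The function name and the tuple layout are hypothetical; only the design itself (fixed test order, Part 1 always before Part 2, formats swapped between groups) comes from the description above.

```python
from typing import List, Tuple

FORMATS = ("computer", "paper-and-pencil")


def schedule(group: int, tests: List[str]) -> List[Tuple[str, int, str]]:
    """Return (test, part, format) triples in administration order.

    Group 1 takes Part 1 by computer and Part 2 on paper; Group 2
    takes the reverse. Test order is the same for both groups.
    """
    if group not in (1, 2):
        raise ValueError("group must be 1 or 2")
    part1_format = FORMATS[group - 1]
    part2_format = FORMATS[2 - group]
    plan = []
    for test in tests:
        plan.append((test, 1, part1_format))
        plan.append((test, 2, part2_format))
    return plan


# Placeholder test names; the paper does not list the ten tests by name.
for row in schedule(1, ["Test A", "Test B"]):
    print(row)
```

Written this way, it is clear that format is balanced across groups within each part, while the order of tests (and therefore of factors) is not varied.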

The results strongly suggest that the factors measured by 
these ten tests are not affected by the use of computer- 
administered tests. Due to the small sample size, Hotelling's T²
test of group differences could not be used. Therefore, multiple
t-tests were conducted between Groups 1 and 2 for each part of
each test, with the Bonferroni correction to control the family- 
wise error rate. Uniformly, no significant mean differences were 
found based on mode of administration. 
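
As a rough illustration of the Bonferroni correction, the sketch below divides an overall significance level by the number of comparisons. The overall level of .05 and the example p-values are assumptions; the paper reports neither the level used nor the individual test statistics.

```python
from typing import List


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under the Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("need at least one comparison")
    return alpha / n_comparisons


def significant(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag each p-value that survives the Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Ten tests x two parts = 20 group comparisons.
# alpha = 0.05 is an assumption; the paper does not state the level used.
per_test_alpha = bonferroni_threshold(0.05, 10 * 2)

# Made-up p-values, for illustration only.
flags = significant([0.001, 0.03, 0.2])
```

With 20 comparisons, each individual t-test must meet a much stricter criterion than the overall level, which is the sense in which the family-wise error rate is controlled.
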

This study also explored whether the difficulties of
individual items changed across the two modes of administration. 
In order to compare the relative difficulty of each of the items 
in each format on the ten tests, the difficulty of each item on 
each test was computed. Each of the item difficulties was 
transformed into a delta by multiplying the difficulty index's 
normal curve equivalent by 4 and then adding 13. For each of the 
two parts of each test, deltas computed from the scores of Group 
1 were correlated with deltas computed from the scores of Group 
2. In no case was the correlation between deltas below .61 and, 
most commonly, the correlations were above .80. These results 
are all the more surprising when one considers that the paper- 
and-pencil tests might have been more difficult because of the 
time limits imposed on the subjects. 
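
The delta transformation and the correlation of deltas across groups can be sketched as follows. The sketch assumes that the difficulty index is the proportion of subjects answering correctly and that the normal deviate is taken so that harder items receive larger deltas; the proportions in the example are invented, not data from the study.

```python
from statistics import NormalDist
from typing import List


def delta(p_correct: float) -> float:
    """Convert a proportion correct to the delta scale (13 + 4z).

    z is the standard normal deviate with area p_correct above it,
    so harder items get larger deltas. p_correct must lie strictly
    between 0 and 1.
    """
    z = NormalDist().inv_cdf(1.0 - p_correct)
    return 13.0 + 4.0 * z


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical proportions correct for the same five items in two groups.
group1 = [0.90, 0.75, 0.60, 0.40, 0.25]
group2 = [0.85, 0.80, 0.55, 0.45, 0.20]
r = pearson_r([delta(p) for p in group1], [delta(p) for p in group2])
```

An item answered correctly by half the subjects falls at the scale midpoint of 13, and a high correlation between the two sets of deltas indicates that the items keep roughly the same relative difficulty in both formats.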

The subjects were also asked which test administration mode
they preferred. Every examinee voiced a
preference for the computer-administered format. Many of the
subjects found it easier to respond on the computer and felt their
performance would be correspondingly better. However, despite
this perception, neither subjects' mean level of performance nor 
individual item difficulties varied significantly between the two 
formats. 

Availability 

Information about the 1976 Factor Kit (paper-and-pencil) is
available from: E. Mingo, Educational Testing Service 05-R,
Princeton, NJ 08541. A complete Kit, containing all 72 tests and
a manual, can be purchased for $30.00. Licensing agreements for
the use of specific tests are also available at 10 cents per copy
reproduced, with a minimum charge of $50 ($35 for graduate
students).

We anticipate having portions of the computer-administered 
Kit available for field-testing in 1991. Individuals interested 
in the computer-administered Kit or participating in the field 
test should contact: R. Ekstrom, Educational Testing Service 09-R,
Princeton, NJ 08541.

Table 1

Content of the 1954 Kit of Selected Tests for
Reference Aptitude and Achievement Factors

Factor                  Tests                           Author(s)

Aiming                  Dotting                         Adapted from MacQuarrie
                        Tracing Easy                    "
                        Tracing Difficult               "

Flexibility of          Concealed Figures               Thurstone (adaptation of Gottschaldt Figures)
Closure                 Designs                         Thurstone
                        Copying                         Thurstone (adaptation of test by MacQuarrie)

Speed of Closure        Gestalt Completion              Thurstone (adaptation of Street Gestalt)
                        Mutilated Words                 Thurstone
                        Four-Letter Words               Thurstone

Deduction               False Premises                  Thurstone
                        Reasoning                       Thurstone
                        Word Squares                    Adkins and Lyerly

Induction               Letter Grouping                 Thurstone
                        Marks                           Thurstone
                        Raven Progressive Matrices

Ideational Fluency      Topics                          Adapted by Taylor from Cattell
                        Theme                           Adapted by Taylor from Cattell
                        Things                          Adapted by Taylor from Cattell

Associative Memory      Picture-Number                  Adapted from a test by Anastasi
                        Word-Number                     Thurstone
                        First Names                     Thurstone

Mechanical Knowledge    Tool Information                Guilford-Zimmerman
                        Automotive Info.                "
                        Mechanical Info.

Motor Speed             Writing X's
                        Writing "lack"
                        Writing digits

Number Facility         Addition
                        Division
                        Subtraction and Multiplication

General Reasoning       Mathematical Aptit.             ACE Psychological Exam.
                        General Reasoning               Guilford-Zimmerman
                        Ship Destination                Christensen & Guilford

Spatial Relations       Cards                           Thurstone
and Orientation         Cubes                           Thurstone
                        Spatial Orientation             Guilford-Zimmerman

Speed of Symbol         Letter "A"                      Thurstone
Discrimination          First Digit Cancellation        Thurstone
                        Scattered X's                   Thurstone

Verbal Knowledge        Vocabulary                      Adapted from a test by Carroll
                        Vocabulary                      Adapted from Cooperative Vocab. Test
                        Wide Range Vocabulary Test      "
                        Advanced Vocabulary             "
                        Advanced Vocabulary             "
                        Advanced Vocabulary             Adapted from a test by Carroll

Visualization           Form Board                      Thurstone
                        Punched Holes                   Thurstone
                        Surface Development             Thurstone

Word Fluency            Suffixes                        Thurstone
                        Prefixes                        Thurstone
                        First and Last Letters          Thurstone



Table 2

Factors in the Kit of Reference Tests
for Cognitive Factors (1963)

Flexibility of Closure
Speed of Closure
Associational Fluency *
Expressional Fluency *
Ideational Fluency
Word Fluency
Induction
Length Estimation *
Associative (Rote) Memory
Mechanical Knowledge
Memory Span *
Number Facility
Originality *
Perceptual Speed (a)
General Reasoning
Semantic Redefinition *
Syllogistic Reasoning (b)
Spatial Orientation (c)
Sensitivity to Problems *
Spatial Scanning *
Verbal Comprehension (d)
Visualization
Figural Adaptive Flexibility *
Semantic Spontaneous Flexibility *

*   New factor since 1954 Kit
(a) Called speed of symbol discrimination in 1954 Kit
(b) Called deduction in 1954 Kit
(c) Called spatial relations and orientation in 1954 Kit
(d) Called verbal knowledge in 1954 Kit

Table 3



Factors in the 1976 Kit of 
Factor-Referenced Cognitive Tests

Flexibility of Closure ' 
Speed of Closure 
Verbal Closure ' 
Adaptive Flexibility * 
Expressional Fluency ^ 
Figural Fluency ' 
Ideational Fluency 
Word Fluency * 
Induction 

Integrative Process ' 
Associative Memory 
Memory Span 
Visual Memory ' 
Number Facility ' 
Perceptual Speed ' 
General Reasoning ' 
Logical Reasoning** 
Spatial Orientation ^ 
Spatial Scanning 
Verbal Comprehension 
Visualization 
Figural Flexibility 
Flexibility of Use " 



Modified test(s) 
New test(s)

New factor and new tests 
Factor name changed/modified 